THE EFFECTS OF PRAISE AND BLAME AS
INCENTIVES TO LEARNING 


CHAPTER I 


Tue 


Introduction 


The purpose of this investigation has been to study the effectiveness of
praise and blame as incentives to learning; and further, to examine
into the possible influence of different testers and emotional factors that may
have been playing a part. The method adopted was to run parallel experiments
with two testers, though for practical reasons there was no duplication
of the emotional investigation. There were two experimental groups
in each case, a Praise and a Blame group, and a Control group. The
answer to the question of the effectiveness of praise and blame was sought
through a comparison of the gains made by each group from an initial test,
when no incentives were administered, to a final test, where all groups
received a common incentive.

Virtually all of the previous experimental work that has been done with
humans on incentives in learning, aside from being so varied as to leave
nothing but confusion in its wake, has missed the vital point that in measuring
the amount of learning taking place, it has been measuring output
of some sort, rather than acquisition.

This problem has been recognized in the infra-human studies by 
Finan (30), Leeper (55, 56), Lashley (54), Tolman (74), Tolman, Hall & 
Bretnall (75), Blodgett (15), and others. 

For example, Blodgett, working with control and experimental groups 
of rats running mazes, sought to study the efficiency of units of practice 
when unaccompanied by reward. The experimental group received no 
reward during the first part of learning, while the control group was 
rewarded throughout the whole of learning. His findings were, in brief: 


1. Rats run under a non-reward condition learned more slowly than rats 
run under a reward condition. This held both for errors and time. 

2. Rats previously run under non-reward conditions made a great improvement
when suddenly rewarded. On the first day after the reward their
drop in mean error was greater than that made by the control group on any
single day.

3. This would seem to indicate that non-rewarded rats were developing a
latent learning which they could make use of when the reward was
introduced.


No such clear-cut picture exists in the human field, as a review of the 
literature shows. 


Historical 

Any real review of the history of motivation would demand far more space than
can logically be assigned here, for the history of motivation is a history of emotions
and instincts. This has been pointed out by Brenner (18), who presented a brief
résumé running from Aristotle to the date of his study, 1934, with detailed attention
being given to the effects of comments as incentives. His bibliography lists twenty
titles. He brings out the natural transition from the experiments with animals, with
Thorndike the leading investigator, to experiments in the human field.

Brenner’s review of the studies employing comments as incentives begins with the
work of Kirby (48) in 1913, then discusses in order, Gilchrist (36), Chapman and
Feder (21), Gates and Rissland (35), Hurlock (41, 42), Sims (69), Symonds and
Chase (72), Deputy (26), and Maller (59). He finds generally that the shortcomings
of these studies (in which we concur) may be listed as follows:

1. Small number of cases. 

2. Inadequacy of the grouping of the subjects to make them comparable. 

3. No attempt to define and isolate an incentive as a single, —s variable. 

4. Inadequacy of the statistical treatment of the data. 

5. Interpretations not substantiated by statistical evidence. 

6. Unwarranted generalization. 


7. Undue expectations that a certain incentive active in a certain way in a laboratory
situation will act similarly in every school situation.


He points out further that the age-range of the subjects (from 7-year-olds to adults),
the varying experimental conditions, and the differences in the kinds of incentives,
were all bound to bring about conflicting and incomparable results. However, all
seem to agree (by implication, at least) that any change in the learning situation will
bring about an increment in performance (this word has been italicized deliberately;
it is crucial to the question and will be developed more fully presently). Brenner
makes a plea for a clearer formulation which will provide a more definite goal to
which research can be directed.

It appears, indeed, that this latter has been one of the chief factors making for a
confusion of issues and a lack of comparableness in the great majority of the studies.
It is unfortunate also that the terminology is no more clear-cut than it is. Praise vs.
Blame, Reward vs. Punishment, Knowledge of Results vs. No Knowledge of Results,
have in many instances been examining the same thing, or have been interwoven in
the same work. These terms have been in large measure not nearly so definable as
their connotations suggest; nor have they always been truly indicative of their
investigation. For example, we might cite Binet and Vaschide (10), after Hurlock (45):
Expérience de forces musculaires et du fond chez les jeunes garçons. Actually, this
was an experiment in motivation with 43 school children, who were encouraged to
break their previous scores. The encouragement, “Allons, tu peux faire mieux que ça,
toi” (“Come now, you can do better than that”), always resulted in improvement.

Regardless of the announced objective of an experiment, be it the effects of praise 
vs. blame, or bell-right vs. bell-wrong, the matter of knowledge of results plays a 
to say that there were four classes per teacher, nor of age or sex. From this he finds
added evidence in favor of praise over blame. 

The first systematic review of incentives is found in Hurlock (45). She reviews
both the infra-human and human fields. The analysis in the human field is grouped
in keeping with the trend of investigations into the incentives used rather than the
situation involved. These are: (a) knowledge of results, (b) praise and reproof,
(c) rewards, (d) punishment, (e) the influence of an audience, (f) rivalry,
(g) distraction, and (h) music. In the field of particular interest to us here, (b), she
summarizes the work of Kirby (48), Binet and Vaschide (10), Scott (67), Ream (64),
Briggs (19, 20), Laird (51, 52), Gilchrist (36), Gates and Rissland (35), Hurlock (41,
42, 44), Cohen (23), and Watts (80). She draws no conclusions, nor summarizes
findings, but presents merely brief synopses of the studies.

Diserens and Vaughn (27) present a comprehensive review of the field of motivation,
but add no further studies to those already mentioned above in the field of social
motivation (praise and blame). They conclude (p. 52) by re-naming Hurlock (41)
that praise is more effective than blame; that indifference is detrimental. They point
out that social motivation requires an approach from the quantitative standpoint in
order to discover the effect of a group in various attitudes on the performance of other
individuals or groups also varying in attitude. They also feel it necessary for
investigation to be made of groups varying greatly in numerical range. They have posited
several “laws” affecting motivation. These are:


1. The number of motives available for experimentation increases with the complexity
and degree of development of the organism.

2. The energy of a motive varies directly with its primitiveness (punishment, food,
sex are more energetic than social motives).

3. The degree of unity of motivating forces in any situation varies inversely with the
intelligence of the motivated organism.

4. The effectiveness of a given motive in any situation varies directly with the number
of coöperating motives or facilitating factors, and inversely with the number of
competing motives or inhibiting factors.


Davis and Ballard (25) present a review and a tabulation of various incentives used
in the school situation. They divide these incentives into three classes: intellectual,
emotional, and social. The intellectual incentives refer to devices such as informing
the pupil of success or failure; the emotional incentives include encouragement or
discouragement; and the social incentives take in the devices of the pupil working in
a social situation. It is in the second classification that we are primarily interested
here. The bibliography contains eight items of which the last, for our purposes at
least, properly belongs under the classification of punishment and rewards. The
other studies have been mentioned before. The authors conclude: that praise is better
than reproof; that both may be effective, but praise is better; that too much praise
will defeat its purpose (a point that has not been brought out previously); that
individual differences in pupils and teachers will govern the extent to which praise or
reproof may be used; that some comment is better than none at all; that praise is
better for younger children, and for the dull; boys are more influenced by reproof,
girls by praise; positive incentives are better for all ages, grades, and intelligence than
negative incentives.

Irwin (46) reviewed the field of motivation with a list of 86 studies. He stated that the
results are conflicting, but that there appears to be a great deal of evidence to show
that the effects of rewards are beneficial to learning, the degree of benefit varying with
the type and the amount of rewards given. The effects of punishment seem to be more
doubtful.
rôle. No doubt, some of the conflicting or indecisive results have been due, in part
at least, to oversight of this problem. 

Judd (47), working with the projection of a straight line without knowledge of
results, has shown that there is no improvement if this factor is removed. Arps (6),
working with three subjects with the ergograph, found that partial knowledge of
results brought about a decrement in performance. Later (7), he found that the
absolute amount of work done and rate were exceeded under conditions of knowledge
of results. Similarly, Elwell and Grindley (29), having subjects attempt to hit a
target with a spot of light by manipulating levers with both hands, found improvement
when results could be observed, and no improvement or a loss when the target was
hidden from view. However, Spencer (70), repeating Judd’s experiment with four subjects,
found that there was improvement with three of the four subjects.

Unequivocal answers to the problem of Praise vs. Blame (or similar incentives)
have also been beclouded by losing sight of the fact that other motivation may
be acting. There is a social aspect, as well as the emotional and intellectual,
that may be exerting its influence. For example, it has been shown by Triplett (77),
Arps (5), and others that group competition is more effective than competition with
oneself. However, Maller and Zubin (60), employing a “very strong incentive of
rivalry,” failed to find this true. The ages of the subjects were not comparable in
these studies, nor the experimental media, and Maller and Zubin’s criteria were somewhat
different. They did find, however, that more items were attempted, but accuracy
was decreased.

It is curious to note the widespread prevalence of the idea of the effectiveness of
employing praise rather than blame. Yet there is to be found no unequivocal evidence
to support this stand. In everyday life sentimental behavior virtually makes mandatory
the “pat on the back,” and abjures the otherwise; but it is surprising when one
encounters this notion in some scientific writings, or in writings from the pens of
persons with scientific training. By way of example, one reads (24):

. . . there appear to be three main sources of motivation among children. (1) They
respond better to praise than to punishment; . . .

. . . a child cannot be forced into solving an arithmetic problem by being punished
for not solving it. You must realize that punishment disorganizes a child’s ability
to think or work. . . . Just after they have been punished they are less integrated
than usual and less able to concentrate. Punishment is therefore never an appropriate
form of motivation.

Is a child any the more integrated when excited with pleasure? 

Countless other citations could be made; but the point is, or appears to be, that in
writing a chapter, or in drawing a conclusion, some investigators have not penetrated
deeply enough. Briggs (19), in a two-page article, reports a study employing the
Laird technique of subjective opinions, with “considerably more than 300 graduate
students” in a summer course on “The Improvement of Instruction in Secondary
Schools” at Teachers College, Columbia. Without preparation or discussion beforehand
the students were asked to report on a twenty-one item questionnaire (Laird’s
list plus “best liked” and “least liked” teacher) as to whether they worked better,
worse, or the same in high school. He finds his results in general agreement with
Laird’s, and from this concludes that the evidence is convincing that commendation,
praise, and encouragement are superior to censure, ridicule, threats, and punishment.
He reports further in the article a study in the Speyer Experimental Junior High
School in which two teachers known to be severe gave a pre-test, alternated praise and
reproof, and gave a final test. Eighty-seven per cent of the pupils did better under
praise than under reproof. No mention was made of the number of subjects, except
Trow (78) reviewed the studies of Wolf (81), Ryans (66), Benton (8), Abel (1),
Anderson (4), and Sears (68) in the field of motivation which pertains here. 

Young (84, pp. 388-416) summarized the work on motivation, but cites only four
studies on praise and blame. He suggests that the need exists to study the effects of
praise and blame in relation to self-evaluation; i.e., to the level of self-esteem.

Murphy, Murphy and Newcomb (61) in their chapter on “Aggression and Competition”
consider the problem of praise and reproof as a question of the relationship
between the individual personality and the situation. In a series of tables they list and
digest in several divisions a large number of studies concerned with incentives. These
tables contain the author, date, subjects, methods, and findings. Relative to praise
and reproof, and under this division, the authors review the studies of Brenner (18),
Briggs (19, 20), Gates and Rissland (35), Gilchrist (36), Hurlock (41, 42), Laird (50,
51, 52), Warden and Cohen (79), and Wood (82). Under “studies of competition”
they list Leuba (57), whose study involved some use of praise. In “knowledge of
results” there appear Book and Norvell (17), employing some reproof, and Chapman
and Feder (21). Under “multi-incentive studies” there are reviewed Anderson (3),
Anderson and Smith (4), Benton (8), Chase (22), Kirby (48), and Sullivan (71).
The authors call attention to the conflicting evidence of the studies they have summarized
and attribute this confusion to a failure to have considered the individual
in relation to the cultural background, and his own peculiar personality make-up.

Bird (11) presents a clinical approach with some numerical enhancing on the 
analysis of factors influencing learning in 100 cases of healthy young children with 
age range of four to six years. Only average MA cases were studied, the others being 
rejected. The MA range is stated as approximating a normal curve. MA was 
measured by Stanford-Binet and the Rhode Island Intelligence Examination. The 
type of learning employed was the kinesthetic part (finger tracing of letters, preparatory
to writing on board) of the stages in learning to read by the Henry Barnard
School method. The measure was the number of errors or attempts to learn a new 
letter. 

Of the 100 children, 30 were found with habitual personality handicaps that
interfered with learning; 37 showed unmistakable affective disturbances, but to a lesser
degree than the others. These were such traits as shyness, tantrums, introversion, 
bullying, etc. 

In speaking of the “30-group,” the author says of the shy, lacking-in-confidence, 
dislike-of-scrutiny, fear-of-task individual that close on the heels of this obstacle in 
learning there was excessive dependence on commendation. Sometimes this was 
sought to satisfy the ego, or for encouragement for further effort. Of these dependent 
individuals, eight were found definitely delayed in their progress whenever praise was 
not forthcoming.

Four children desired to win distinction by unusual behavior. To these, admonition 
was more satisfying than indifference. 

Adams (2) presents a philosophical essay on praise. He considers it a necessary and 
desirable weapon in the hands of the teacher early in the school-life of the child, but 
feels that with maturation this must be eliminated gradually. 

Forlano and Axelrod (31) made a study of 131 pupils in four classes of the fifth 
grade in a New York public school. These were divided into three groups—control, 
praise, and blame, and the latter two into two groups each, introvert and extrovert. 
The Woodworth-Wells Number Cancellation Test was the testing vehicle. Three 
weeks elapsed between tests in which a thirty second practice period was allowed, 
followed by a two minute test period. The pupils were called to the desk, a mark of 
P or G placed upon their papers according to a prearranged plan, then they were 
asked to return to their seats leaving their papers face down on the desk. Announcement
was made that those who received P had done poorly, while those who had
received G had done well. The next test was given at once. The control group
followed the general procedure but without the incentives being administered. The
three groups were equated on the basis of initial scores, CA being disregarded since
the correlation was only -.03. All groups gained in succeeding trials (2) over the
first. There were significant differences in the second trial for the IB over the control 
group and of IB over IP. On the third trial differences were again significant for IB 
over C, EB over C, and EB over EP. On this trial the standard ratios had increased 
for the praise group. They concluded that blame was a more effective incentive than 
praise, and speculated on the increase of the standard ratios for the praise group. 
Would this have continued to significance had more trials been employed? 

Blissenbach (14) presents an unfactual discourse closely approaching complete
sentimentality on the value of praise. It is worthless as far as providing any scientific data
on the question.

Blankenship and Humes (13) attempted to determine the effect of praise and 
reproof on memory span performance and memory span reliability. Their subjects 
were female college undergraduates divided into three groups: control, 44; praise, 43; 
and reproof, 43. The average age was about twenty years. Statements of praise or 
reproof were made verbally. The test media were two sets of digits varying from 
5 to 13 numbers, which were read to the subjects at a rate of slightly less than one 
per second. Each group was tested twice, a different series being used the second 
time, after a week's interval. They concluded that the effects of praise or reproof
were negligible, as far as memory span is concerned, the differences between gains of
each group being statistically not significant. The approved group showed significantly
more reliable memory span performance (0.88) than the reproved (0.41), or
the control (0.47). The difference in reliability coefficients between the last two groups
is not significant. There was a low correlation coefficient in each group (from .07
to .12), indicating no relationship between memory span gain and age in each group.

We gather from a review of the literature that:


1. Whether praise is a greater incentive than blame, or vice versa, is still an
unsettled question.

2. The later studies tend to show little difference between the effectiveness of these
incentives, and if leaning in any direction, do so towards blame.

3. Both praise and blame give increased performance on the second and third trials.
Additional trials will produce one of two results, dependent upon the study chosen:
either a continuing increment or a decrement in performance.

4. It appears that both praise and blame show only slightly increased performances in 
the second and third trials over any other change that might be introduced into 
the learning situation. 

5. Really adequate comparisons cannot be made between studies, because: 

a. The variables of IQ, CA, MA, grade, initial ability, intensity of motivation (or 
type), and Emotional Aspect, have not been equated for, or taken account of, 
in many studies, or are on entirely differing planes. 

b. Techniques, approaches to the question, test media, vary considerably between 
experiments. 

c. Often the statistical treatment of the data has been unreliable or inadequate. 

d. Generally, a lack of any common ground upon which to make comparisons. 

6. The actual motivating factor has been accepted too casually (with the possible
single exception of Brenner):

a. Equal statements of praise or reproof are not necessarily of equal weight in 
effectiveness; what might act as blame for one individual (donor or recipient) 
might act as praise for another. 
b. The method of administering the incentive will affect its potency, and has
varied considerably from experiment to experiment; from a simple statement 
of praise or reproof to a quite lengthy statement, and from the giving of it 
to the individual alone to the giving of it to the group or to an individual 
before a group. 

7. The total situation, including the physiological, has been given little attention. 

8. In none of the studies has the factor of the influence of the experimenter’s
personality been taken into account.

9. And, finally, there has been no distinction made between learning and the ends
employed to seek its measurement.


Methods 


The preceding review of the literature leaves no definite conclusions to
be drawn as to the effectiveness of praise and blame as incentives to
learning. This has been brought about, it is felt, through there being no
clear-cut conception of the basic principles involved, or through a
non-realization or non-differentiation of what was being sought and what was
actually being done. It is true that performance has been mentioned
frequently in the literature cited, but this has been done casually and
interchangeably with learning.

Quite irrespective of the “system” of psychology adhered to, the fact 
seems patent that very little is known of the actual processes involved in 
the matter of learning. Experimental work (this study included) which 
has attempted to solve the problem of learning or of motivation in learning 
has had to resort to a measuring of output on the part of the subjects. 
What forces, or factors, were operating at the time of acquisition and at 
the time of reproduction could not be discerned. The question hinges 
upon two factors: the amount of learning previously acquired; and the 
forces operating at the time of performance. 

The crucial point in this experiment comes from a comparison of the
improvement of each group from the initial test to the final test
(these gains are shown in Table VIII, p. 38).*

In the case of the initial test, it is reasonably certain that the urge to exert
oneself was comparable in all the groups. In the initial test, no specific incentives
were applied; the subjects were simply requested to do the test. In the final test,
all the groups were administered the same incentive. Now, if the energy output on
the final test were comparable in the groups, it could be said that the group making
the highest score had reached a higher degree of attainment, and that the
group improving the most from the initial to the final test had learned the most.
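
The comparison just described can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not part of the original procedure: the function name `mean_gain` is hypothetical, and the scores are invented for the example and are not data from this study.

```python
from statistics import mean

def mean_gain(initial_scores, final_scores):
    """Mean per-subject gain from the initial test to the final test."""
    if len(initial_scores) != len(final_scores):
        raise ValueError("each subject needs both an initial and a final score")
    return mean(f - i for i, f in zip(initial_scores, final_scores))

# Hypothetical scores, invented purely for illustration (not data from the study).
# "initial": no incentive applied; "final": the common incentive given to all groups.
groups = {
    "Praise":  {"initial": [40, 52, 47], "final": [61, 70, 66]},
    "Blame":   {"initial": [42, 50, 45], "final": [66, 73, 70]},
    "Control": {"initial": [41, 51, 46], "final": [60, 69, 64]},
}

# Gain of each group from the initial to the final test.
gains = {name: mean_gain(s["initial"], s["final"]) for name, s in groups.items()}

# The group with the largest gain is the one that, on this reasoning,
# would be said to have learned the most.
largest = max(gains, key=gains.get)
```

Only the initial and final tests enter the calculation; the intermediate tests, given under differing incentives, are deliberately left out of the gain.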

*S with subscript refers to the stimulus-test situation. Thus S; refers to the initial test; 
S: to the first test immediately following this; S: to the next; then, Ss; and Se, the final test, 
the one where the common incentive is given. In addition, in the College group, Sp indicates 


a preliminary period to the main testing; and Sr, a final after-test period. The order was thus: 
Any such hypotheses as above could not be ventured were the results
to be taken from any of the intermediate learning tests. On any of
these it cannot be certain whether a high-ranking group
has actually learned more or has merely expended more energy because
of the incentive.


Discussion 


Let this question be aired more fully. It has been observed that virtually 
nothing but confusion exists in so far as a definite answer is concerned 
relative to the effects of praise or blame as incentives to learning. In one 
study the evidence points to blame as being the more effective; in another 
study the data all favor praise. How does this come about? The answer 
lies in the design of the experiments. A test is given. This is followed 
by another test, or perhaps several, under incentive conditions (praise, 
blame, or no comment). Then curves are plotted and data assembled. 
From these, conclusions favoring one incentive or another are drawn. 
In the one case, there would be found evidence indicating praise to be the 
most effective; in the other, just as cogent evidence in favor of blame. But 
such experiments have been superficial, have not penetrated deeply enough; 
nor have they been set up to really answer the question: “Is praise or is 
blame the more effective incentive to learning?” What, then, have prior 
experiments in the human field actually been concerned with? Output, 
performance, expenditure of energy, and the like. Were the control 
groups learning any the less than the other group? The data do not present 
any evidence adequately to answer this; for these data have been examining 
performance and not learning. What would the control groups have done 
had they been stirred into action, had they been motivated into performing 
with an equal amount of effort as the other groups? The answer is not 
to be found in the data in the human field. 

In the animal field, Blodgett (15) * has thrown some light on the problem. 
His learning curves show almost a straight line flatness for the rats running 
the maze without the food reward; that is, there was little decrease in 
errors or time scores. On the contrary, the curves of the rats given the 
reward at the end of each run fell off sharply. After varying periods of 
time (3 days to 7 days), food reward was introduced to the previously 
non-rewarded animals. When this was done, the error and time curves 
fell off precipitously, the improvement (drop in mean error) exceeding in
some instances the best improvement on any single day of the always-rewarded
group. It could be nothing but a stretching of the imagination
to attribute the tremendous gain of the latterly rewarded rats to a sudden
accumulation of learning. Clearly, prior to the reward these animals had
been learning the maze, as Blodgett points out, but had been without any
pressing motivation to hurry to the exit.

2 Op. cit., pp. 120-121.

Now, look at two of Blodgett’s curves.*

[Figure: two of Blodgett’s curves; vertical axis, Error.]


Had the experiment terminated at the points x, there could have been but 
one implication to follow: rats rewarded at the end of running a maze 
learned more than those not rewarded. Such analysis or scrutiny would 
be in keeping with the majority of past experiments in human motivation. 
But following along with Blodgett, it is to be observed that conclusions 
based on such scrutiny are insufficiently founded. Thus, learning is not 
measured as a thing in itself, but as amount of output. Further analysis 
of the Blodgett curves reveals a quite different solution where all groups 
have the same incentive to produce. After the introduction of a common 
incentive, the previously unrewarded rats not only reduced their error and 
time scores pronouncedly, but in some instances improved more than the 
always-rewarded rats. 

Logically, then, if the measure of learning must be in performance, then
there must be comparable opportunities to perform for both control and
experimental groups. Remove the equal opportunity for groups to want to
produce, and there must follow an entirely different measure of learning if,
indeed, it is a genuine measure. At any rate, learning as evidenced through
performance is only partially and incompletely measured. Even a cursory
glance at the curves of the present experiment (Figures 3 to 13, pp. 11-21)
will show the great number of possible conclusions that might be drawn
were these to be based on the results only of the first, second, or third tests
after the initial test.

Ibid., Figs. 5 and 6.

However, the gains from test to test under the influence of the different 
incentives are of considerable interest; and it seems important to show 
performance on the various tests in relation to performance under the 
common incentive. The scores at all successive points have been given
in Tables V and VI, and have been graphed in Figures 3 to 13. In
Chapter III these curves will be briefly discussed, after which the more
crucial comparison of the gains from the initial to the final test will be presented.


[Running head: Hermann O. Schmidt, The Effects of Praise and Blame as Incentives to Learning]

[Figures: line graphs of score by test for each group. Legend: Praise, Blame, Control.]

Fig. 3. Boys and Girls Combined. Tester A. Grade VII

Fig. 4. Boys and Girls Combined. Tester A. Grade VIII

Fig. 6. Boys and Girls Combined. Tester B. Grade VII

Fig. 7. Boys and Girls Combined. Tester B. Grade VIII

Fig. 8. All Boys. Tester B. High School

Fig. 9. Boys. Tester A. Grade VII-VIII

Fig. 10. Girls. Tester A. Grade VII-VIII

Fig. 11. Boys. Tester B. Grade VII-VIII

Fig. 12. Girls. Tester B. Grade VII-VIII

Fig. 13. All Men. Tester A. College

CHAPTER II


THE SET-UP OF THE EXPERIMENT


The Groups


The groups employed as subjects in this experiment were comprised of
the following: 85 boys and 81 girls from six 7th grade classes, and
84 boys and 102 girls from six 8th grade classes, all from the Woodland
Way Junior High School in Hagerstown, Maryland (population, approximately
35,000); 192 boys from six high school classes at McDonogh School,
McDonogh, Maryland (a private boys’ school near Baltimore); and a
college group of 30 men.

The high school groups were made up from unselected students from 
all four high school years. They were unselected in that no regular 
scholastic or intellectual qualifications determined their placement into 
groups. They were members of various activity groups: some were from 
a tap-dancing class; others belonged to a chorus; still others were from 
the orchestra, et cetera. The groups represented, as far as could be 
ascertained, a true sampling of the Upper School of this institution. 

The pupils in the lower grades were placed in the classes on a hetero- 
geneous system, and could be taken as representative of the population of 
that school and general locality. 

The men in the college group were, with three exceptions, members of 
the first year class at The Johns Hopkins University. In the three excep- 
tions, one was a senior student, one a junior, and one a sophomore. All 
were men who had volunteered for the experiment. 

The division into Praise, Blame and Control groups was done according 
to the following scheme: for the lower years and high school, the first 
class tested became the Praise group; the next class became the Blame 
group; and the last class made up the Control group. In the colle,_ yroup 
the men were arranged first according to intelligence test scares Otis (63); 
then into clusters of three, the first of this cluster becoming a Praise subject, 
the next a Blame subject, and the third, the Control subject. 
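
The college assignment scheme just described can be sketched as follows. This is a minimal illustration rather than the writer's procedure in detail: the function name, the subject labels, the scores, and the choice to rank from highest to lowest score are all assumptions made for the example.

```python
from typing import Dict, List

CONDITIONS = ("Praise", "Blame", "Control")


def assign_by_clusters(scores: Dict[str, float]) -> Dict[str, List[str]]:
    """Rank subjects by test score, then deal each consecutive cluster of
    three into Praise, Blame, and Control, in that order."""
    # Ranking from highest to lowest is an assumption; the text gives no direction.
    ranked = sorted(scores, key=lambda subject: scores[subject], reverse=True)
    groups: Dict[str, List[str]] = {name: [] for name in CONDITIONS}
    for position, subject in enumerate(ranked):
        # Position within the cluster of three decides the condition.
        groups[CONDITIONS[position % 3]].append(subject)
    return groups


# Hypothetical scores for six subjects (illustrative only, not the study's data).
example_scores = {"s1": 61, "s2": 58, "s3": 55, "s4": 52, "s5": 49, "s6": 47}
groups = assign_by_clusters(example_scores)
```

Dealing out of ranked clusters keeps the three groups close in intelligence test score without discarding any subject; with thirty men it yields ten per group.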

The 7th and 8th grade groups and the high school groups were adminis- 
tered the tests as a group. The members of the college group were tested 
individually. In addition, a continuous kymogram was made on this latter 
group for pulse and respiratory rates, and for blood pressure. 
TABLE I

Similarity of the Groups with Respect to CA, MA, and Initial Score. Tester A

                      CA in Months          MA in Months          Initial Score
GRADE          N     Range    x̄    s      Range     x̄      s     Range    x̄     s
VII
  Praise      26    141-181   ..   ..    134-178  155.2  10.3     0-47   18.9  14.9
  Blame       29    133-188   ..   ..    126-183  149.7  15.8     0-41   20.4  11.8
  Control     30    142-177   ..   ..    134-174  148.1   9.5     8-40   21.6   8.9
VIII
  Praise      31    153-191   ..   ..    125-200  163.7  16.2     5-43   25.6  11.2
  Blame       30    144-188   ..   ..    117-203  163.3  17.9     5-56   26.6   4.5
  Control     33    143-180   ..   ..    133-201  162.3  16.0     7-64   33.3  14.9
VII-VIII
 Boys:
  Praise      29    141-181   ..   ..    125-193  158.2  15.7     0-45   21.2  14.8
  Blame       29    133-188   ..   ..    117-203  155.7  20.9     0-41   19.3  12.1
  Control     28    140-180   ..   ..    134-201  154.6  16.3     9-64   27.6  12.2
 Girls:
  Praise      28    141-191   ..   ..    142-200  159.5  14.0     5-47   23.5  11.9
  Blame       30    132-188   ..   ..    127-184  157.9  13.9     9-56   27.6  12.4
  Control     35    142-178   ..   ..    133-196  156.3  12.4     7-52   27.6  15.1
H.S.
  Praise      14    175-215   ..   ..    160-242  202.4  25.0     5-52   32.9  11.2
  Blame       41    169-220   ..   ..    146-263  196.2  26.2     4-57   32.9  11.2
  Control     23    179-228   ..   ..    140-232  184.4  21.1     6-65   33.0  19.1
COLLEGE
  Praise      10    204-238   ..   ..    175-230  209.0  16.8    18-49   35.4  10.2
  Blame       10    204-260   ..   ..    178-234  211.2  17.3    28-50   38.9  14.1
  Control     10    203-258   ..   ..    193-230  210.9  12.5    27-69   37.6  12.2

(.. marks an entry not legible in the source.)
TABLE II

Similarity of the Groups with Respect to CA, MA, and Initial Score. Tester B

                      CA in Months          MA in Months          Initial Score
GRADE          N     Range    x̄    s      Range     x̄      s     Range    x̄     s
VII
  Praise      29    130-188   ..   ..    126-176  147.1  12.7     3-63   30.8  13.9
  Blame       30    137-181   ..   ..    126-183  154.2  13.2     7-55   34.3  12.2
  Control     22    139-178   ..   ..    133-178  155.3  11.8     3-55   32.1  11.9
VIII
  Praise      31    152-196   ..   ..    131-197  159.4  15.3    22-68   40.5  10.9
  Blame       28    146-185   ..   ..    130-190  159.9  16.8     5-61   29.3  14.3
  Control     33    153-191   ..   ..    137-204  160.6  17.8    18-88   44.4  14.2
VII-VIII
 Boys:
  Praise      31    140-196   ..   ..    126-185  150.6  14.4     6-62   32.9  10.8
  Blame       26    139-185   ..   ..    130-183  154.9  14.8     7-50   29.7  11.5
  Control     26    143-186   ..   ..    125-204  154.8  16.7    18-88   37.9  14.8
 Girls:
  Praise      29    130-196   ..   ..    135-197  156.5  14.7     3-68   38.8  15.2
  Blame       32    137-183   ..   ..    126-190  158.4  15.6     5-61   33.6  14.6
  Control     29    139-191   ..   ..    137-190  161.6  15.2     3-67   40.8  14.3
H.S.
  Praise      41    177-230   ..   ..    144-256  197.9  25.3    13-104  44.6  16.8
  Blame       30    177-224   ..   ..    148-227  192.1  22.0    24-71   49.9  11.9
  Control     43    168-223   ..   ..    157-241  197.8  12.4    14-69   41.7   6.7

(.. marks an entry not legible in the source.)
All subjects received a code number, so that no names appeared on any 
of the test sheets. 

The total number of subjects used was 574. Tables I and II show the 
divisions and similarities of the groups for chronological age, mental age, 
and initial score. Mental ages were computed from the intelligence quo- 
tients secured from the school, and in the case of the college group, from 
the results of the intelligence test given by the writer. Where the chrono- 
logical age exceeded thirteen years, zero months, its multiplier was 
determined after Terman and Merrill’s system (73). 
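
The computation of mental age from a quotient can be sketched as below. The text does not reproduce the Terman and Merrill rule, so the adjustment coded here is an assumption: chronological age is counted in full up to 13 years, two of every three further months are counted up to 16 years, and a ceiling of 15 years applies thereafter. The function names and the sample figures are likewise illustrative only.

```python
def adjusted_ca_months(ca_months: float) -> float:
    """Chronological-age divisor in months under the ASSUMED adjustment rule."""
    if ca_months <= 156:   # up to 13 years, 0 months: no adjustment
        return float(ca_months)
    if ca_months >= 192:   # 16 years and over: assumed ceiling of 15 years
        return 180.0
    # Between 13 and 16 years: assume two of every three added months count.
    return 156 + (ca_months - 156) * 2 / 3


def mental_age_months(iq: float, ca_months: float) -> float:
    """Mental age recovered from an IQ as IQ x (adjusted CA) / 100."""
    return iq * adjusted_ca_months(ca_months) / 100


# Hypothetical subject: IQ 110 at a chronological age of 18 years (216 months).
example_ma = mental_age_months(110, 216)
```

Under this assumed rule a subject older than 16 years is treated as having a chronological age of 180 months, which is why mental ages can fall below chronological ages in the older groups even when the quotient exceeds 100.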


The Selection of the Material 


The test material was devised by the writer. It consists of a number of 
characters, six, from Gates’ Test of Associative Learning (34),¹ arranged 
on a page. Figure 14 presents the key and first two lines (of 14) of the 
test. It is a code substitution test, the code appearing at the top of each 
sheet. The characters were first traced on a stencil. These were then 
mimeographed, cut out, and pasted on a large sheet of paper, the scaling 
being approximately 2:1 of the ordinary 8½ x 11-inch size paper. This large 
sheet was then reduced photographically; a new stencil tracing was made 
of this, and the requisite number of tests then were mimeographed. 

Fig. 14. Key and Lines 1 and 2 of Test Employed 

The total number of characters that can be coded is 154. This is not quite 
a sufficient number to give reasonable assurance that no one will complete 
the test before the allotted time of two minutes. On several occasions it 
appeared that some subject would finish ahead of time on the last test. In 
these cases an additional sheet was given them just prior to the last test. 
Later analysis showed that this move had been well taken in several 
instances: one subject coded 157 characters, and another, 161.² 

¹ The writer would like to take this opportunity to express his appreciation to Professor 
Gates for his kind permission to make use of this material. 

² The original test contains 165 characters. In the tests used in this experiment one column 
was omitted because of the necessity of leaving a margin for the mimeograph machine. It 
had been planned to have the original test plated and printed, but the cost appeared prohibitive. 
Each subject received a packet of five tests, each numbered serially 
and with subscripts in the upper right hand corner of the page. This 
would identify a subject and a particular test. Thus, a packet was 
numbered, for example, 1201_I, 1201_1, 1201_2, 1201_3, and 1201_F. The five tests 
were clipped together with a thin staple, and had an outside yellow 
“second-sheet” upon which also appeared the proper number and upon which the 
subject wrote his name at the termination of the test. 


Procedure 


Upon entering the room, the tester requested that all desks be cleared 
and then gave out the test packets. The subjects were cautioned before- 
hand, and this was seen to, that no one was to examine the packet nor in 
any other way get a “running start” on any of the others. 

When all was in readiness the following instructions were read to the 
subjects: 

I must ask you to please give to what you are about to do your most 
whole-hearted attention. Some of this may appear as silly, or monotonous, 
to some of you. Try not to have any personal views on the matter. Do not 
worry about it if you cannot do all or many: practically no one can do 
it all. It is important that all instructions be carried out promptly and to 
the letter. Do not begin, or do anything, until I say “Go”; stop at once, 
even if you are in the middle of a stroke, when I say “Stop.” Do not turn 
any pages until told to do so. 

This is a code, or substitution, test. The object is to see how many of 
the numbers you can place correctly underneath the proper symbols. The 
key is at the top of each sheet. Work from left to right in each row. When 
I give the signal to begin, turn the first sheet (the topmost sheet as the 
papers are in front of you) and start working at once. No questions will 
be answered after the test begins. (Pause.) Everybody ready? Begin. 


The examiner walked about while the subjects were working, observing 
their work and making certain that all were on the correct page. At the 
end of exactly two minutes, the examiner said: 

Stop. Everybody stop. Tear off the sheet upon which you have just 
been working and pass the sheets forward quickly. 

Care and insistence had to be exercised at this point in order to prevent 
counting-up of scores on the part of the subjects, or discussion with their 
neighbors. 

After the papers had been collected the tester made an apparently intensive 
examination of the papers for one minute, keeping alert, nevertheless, to 
prevent talking, or the turning over to the next sheet. 
At the end of the minute interval, the examiner said (depending upon 
the incentive to be employed): “That was very good (poor),” or made no 
comment. “Let’s try this again. Begin when I say ‘Go’; stop when I 
say ‘Stop.’ Ready, begin.” The procedure of walking about, collecting 
and checking papers, etc., was repeated. Before each of S_2 and S_3 the 
tester said: “You are doing very well (poorly),” or made no comment. 
“Let’s try this again. Ready, begin.” 

As the preface to S_F, the tester announced the following: “You have 
done very well (poorly),” or made no comment. “This is the last test. 
Now, I shall give one dollar to the one showing the greatest improvement. 
Ready, begin.” At the completion of this test, the papers were passed 
forward as before; then the pupils were asked to write their names on the 
yellow facing sheets, which were collected. 

Each separate test period was exactly two minutes in duration, and each 
interval between tests one minute. Timing was done with stop watches 
which were checked with each other and found to have no detectable 
differences. 

There were two testers for each grade, except the college group. The 
two testers worked independently of each other, but at the same time 
and at the same school. These testers will be known as Tester A and 
Tester B. Tester A was the writer; Tester B was a female graduate student 
in Education at Johns Hopkins, with some training in test procedure, and 
who had been thoroughly instructed and rehearsed by the writer. Each 
tester carried a set of directions and procedure into each class and followed 
these implicitly. 

As a check against confusion in spite of coding, each separate test was 
immediately bound with an elastic band after the one minute examination 
period and labeled. The completed tests of each class were kept in separate 
manila folders which were marked with the class designation, the incentive 
employed, the place, and the tester. 


The College Group 


The college group was tested by Tester A only. These were all individual 
tests. Before the test proper began, each subject was questioned as to the 
amount of exertion undergone, or as to the amount and time of food taken, 
immediately prior to appearing for testing. Nothing was done in the case 
of recent partaking of food (there were three cases); but in the case of 
exertion, a reasonable time was allowed for recovery. After this, the 
apparatus for recording continuous blood pressure, pulse, and breathing 
rates was adjusted and set into operation. There was a preliminary (S,) 
period of two minutes duration with each subject, during which they 
performed some simple task, such as Gates’³ Same-Different Digits or 
Same-Different Figures. After an interval of one minute, the test proper 
began and followed the same procedure as described earlier. At the end 
of the test proper there was a two-minute “final” phase (Sp), during 
which the subject rested and engaged in small talk with the writer. 

Care was observed that the conversational topics were quite banal and 
that, in so far as could be judged, they were non-stimulating. The sole 
purpose of the conversation was to avoid ennui on the part of an otherwise 
unoccupied subject. 

Each subject had been advised before the beginning of the experiment 
that there would be no “tricks,” or shocks, or fright-situations; and they 
were adjured to remain as motionless as possible during the running of 
the experiment. They were advised, too, that possibly unusual sensations 
of tingling or coldness might be felt in the arm or hand employed with 
the sphygmomanometer cuff, but were asked to disregard these. In no 
case did a subject report annoyance or discomfort from the apparatus. In 
fact, all stated that they were unaware of any sensation in the arm or hand, 
or chest, until after termination of the experiment. The amount of dis- 
comfort they reported as being of negligible degree. 

All were made as comfortable as possible before beginning the experi- 
ment. Each group reported at approximately the same time of day. With 
four exceptions, the members of a cluster followed each other at about 
twenty-four hour intervals. In the case of the four exceptions, there was a 
ten-day lapse. These men had not reported when first scheduled because 
of conflicts in classes, and reported after the Christmas holiday. 


The Apparatus 


The apparatus employed with the college subjects was standard laboratory 
equipment for measuring the physiological changes of blood pressure, pulse, 
and breathing rate. The subject was in every instance screened from the 
apparatus. The apparatus consisted of a recording drum, with extension, 
revolving at the rate of 3.25 inches per minute; a sphygmomanometer of 
the common desk type, with a Schrader air release valve between the top 
of the manometer and tambour cut-off valve, tambour and recording lever; 


3 Op. cit. 
Sumner-type pneumograph, with tambour and recording lever; signal 
marker, operated from a single dry cell in series with a Jacquet chronometer 
registering at one second intervals, and a manually operated signal marker, 
operating off the same dry cell. 

Thoracic breathing curves were obtained, the rubber tubing being placed 
in each case across the sterno-costal angle anteriorly, and the chain passing 
just below the inferior margins of the scapulae posteriorly. Tension was 
approximately the same for each subject. In addition, in order to further 
insure comfort to the subject, the plates of the pneumograph were placed 
at the mid-axillary line. 

Since the average time each subject spent in the apparatus was approxi- 
mately seventeen and one-half minutes, no large amount of pressure could 
be placed against the circulation. Preliminary experimentation had shown 
that at about eighty millimeters pressure of mercury, the optimal comfort 
to the subject and identifiable pulse readings could be obtained. This is, 
of course, neither a systolic nor a diastolic pressure; but variation in blood 
pressure, whatever it may be, can be obtained, as well as pulse beats, with 
this arrangement. 


Scoring 


All scoring was done by the writer. No objective attempt was made to 
determine whether or not the subjects’ responses were correct. Each subject’s 
score was the number of substitutions attempted. This was considered a 
legitimate procedure since the product-moment correlation, based on a sampling 
of 100 papers, between number of items attempted and number of correct 
responses, was +.944. 
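
The product-moment correlation used to justify this scoring can be sketched as below. The function name and the five pairs of scores are hypothetical and serve only to show the computation; they are not drawn from the 100 papers sampled.

```python
from math import sqrt
from typing import Sequence


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson product-moment correlation between two paired series."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two series of equal length, at least two pairs")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and of squared deviations about the means.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical papers: items attempted and items correct (illustrative only).
attempted = [40, 55, 62, 71, 90]
correct = [38, 55, 60, 66, 88]
r = pearson_r(attempted, correct)
```

A coefficient close to +1 means that ranking subjects by items attempted gives nearly the same order as ranking them by items correct, which is the ground on which the simpler score was accepted.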

In the case of pulse and breathing rate, only the rates during a period 
thirty seconds before the end of a test and thirty seconds after its beginning 
were used. The interval between tests was used in its entirety. With the 
breathing rate the I-factor, or I/E ratio, was computed; while for pulse, 
an average rate per minute was computed from the rates of the sample 
periods. 
CHAPTER III 
The Results * 


General Treatment 


Throughout the discussion, and in the various tables, the notations 
x̄ and s will be used respectively as estimates of m, the mean of the 
population, and σ, its standard deviation. Where N was small, as it was 
in most of the cases, σ was best estimated from the following expression 
(Goulden, 37): 

$$ s^2 = \frac{\Sigma (x - \bar{x})^2}{N - 1} $$
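
The small-sample estimate of the standard deviation, with N - 1 in the divisor, can be checked with a short sketch. The five scores are hypothetical, and the function name is chosen for the example.

```python
from math import sqrt
from statistics import stdev
from typing import Sequence


def estimate_sd(values: Sequence[float]) -> float:
    """Estimate the population standard deviation from a small sample,
    dividing the sum of squared deviations by N - 1."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two values are required")
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Hypothetical scores (illustrative only).
scores = [18, 22, 25, 31, 34]
s = estimate_sd(scores)
```

The standard library's `statistics.stdev` uses the same N - 1 divisor, so the two results agree.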

The difference in score of the last test over the first test, or gain, was 
employed as the measure of learning. The Pearson Product-Moment correlation 
of Gains vs. Final Score was in every instance positive and high 
(Tables III and IV). 
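
As a worked illustration of the gain measure, the sketch below takes the mean initial and final scores printed in Table V for grade VII with Tester A and forms the gains and their differences. The variable and function names are chosen for the example; only the six means come from the table.

```python
# Mean initial and final scores, grade VII, Tester A, as printed in Table V.
initial = {"Praise": 18.9, "Blame": 20.4, "Control": 21.6}
final = {"Praise": 97.3, "Blame": 81.3, "Control": 79.9}


def mean_gain(group: str) -> float:
    """Gain of a group: final mean minus initial mean, to one decimal."""
    return round(final[group] - initial[group], 1)


def gain_difference(first: str, second: str) -> float:
    """Difference in gain, first member minus second member."""
    return round(mean_gain(first) - mean_gain(second), 1)


praise_vs_control = gain_difference("Praise", "Control")
blame_vs_control = gain_difference("Blame", "Control")
praise_vs_blame = gain_difference("Praise", "Blame")
```

The three differences obtained this way agree with the uncorrected differences for grade VII, Tester A, as printed in Table VIII.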

Before the significance of gains and/or their relatedness could be 
analyzed, there were several variable factors, or possible influencing ele- 
ments, that demanded a priori consideration. These were: chronological 
age, mental age, initial score, and objective features pertaining to the two 
testers. There conceivably might be added the matter of locale, or general 
setting in which the testing was conducted. 

What could be done about eliminating these influences, or about holding 
them constant, presented a real although not a novel problem. The question 
of equating one’s subjects or groups is one with which every experimenter 
is faced in a problem of this sort. Actual physical equating of subjects 
by elimination presents several disadvantages; one of these is that often 
the numbers with which one is left have been reduced to a point beyond 
which even small sampling technique would prove untrustworthy; another, 
and perhaps even more vital point, is the somewhat intangible one of just 
what happens to data, or just what sort of data are left, if cases have been 
extracted from them. 


* The scatter-diagram worksheets for the correlations, and original data sheets, are not 
presented but are on file at The Johns Hopkins University. 
TABLE III 


Pearson Product-Moment Correlations (Incentive Groups and Sexes Combined). 
Grades VII, VIII, H.S., and College 


Groups 


Gains vs. CA 
Gains vs. MA 
Gains vs. IS 
Gains vs. FS 
IS_ vs. FS 
MA vs. IS 
MA vs. FS 
CA vs. IS —.162 


CA vs. -.112 -017 


MA vs. —.298 —. 406 


As to the matter of locale, it was not considered necessary or feasible to 
attempt corrections for S.E.S., or similar influences, since our groups were 
drawn at random and were representative of the population from which 
drawn. It is conceivably true, of course, that there may be differences 
between major groups. 

The extrinsic or objective influence of each tester was held constant in 
so far as was possible to contemplate beforehand; e.g., each tester wore his 
or her same outer garments at each testing; reading of directions was 
rehearsed, so that no voice or other cues would be present in one situation 
and not in another; and procedure was standardized as given in Chapter II. 

In this study the difficulties usually encountered in equating groups have 
been overcome through the employment of a statistical procedure. The 


= 


2 
vil vil HS. CoLLEcE 

| | noe | | | | ess 

—.227 | -.213 | —.336 

.285 .077 166 4 

-~.041 | —.088 .260 

881 858 811 a 

+539 | .428 -499 

.198 .207 

-.243 | -.139 | —.205 

-.360 | —.226 —.199 
TABLE IV 


Pearson Product-Moment Correlations (Incentive Groups Combined). 
Grades VII-VIII Combined 


Tester A B 
Groups N, 86 N, 93 N, 83 N, 90 
B G B G 
Gains vs. CA —.09 —.38 +25 
Gains vs. MA -06 -26 -22 
Gains vs. IS —.12 -.10 -.19 —.23 
MA vs. IS -45 .38 23 -32 
CA vs. IS —.38 .10 -.07 -.13 
MA vs. CA —.28 —.10 —.34 —.23 


number of cases when broken into various groups, although not large, was 
of sufficient size to permit the use of small sampling techniques, and the 
data were left in the original unadulterated state. The variables that were 
accounted for or corrected for by the method adopted were those of 
chronological age, mental age, and initial score. 

However, before the data could be treated, it was first necessary to test 
the linearity and normality of the various distributions. This was done 
by plotting the regressions of Gains vs. MA, Gains vs. CA, etc. These 
curves were found to be sufficiently close to normality and linearity to be 
best fitted by a straight line, and to permit the calculation of multiple 
regression equations (Holzinger, 40). 

Throughout this study where comparisons have been made, or the symbol 
“vs.” employed, it is to be understood that the difference indicated is that 
of the first member minus the second member. Hence, there will appear 
at times negative differences; and, in many cases where variable-correction 
factors have been applied, it will be found that the corrected difference is 
larger than the raw difference. 

In making comparisons, use has been made of the relative deviate, which 
is merely the number of standard deviations that x, a variate, is away from 
the mean of its series, in either direction. This can be expressed thus 
(Treloar, 76): 

$$ k = \frac{x - \bar{x}}{s_x} $$

where x - x̄ and s_x are dimensions on the x scale. Dividing one by the other 
produces k, a pure number, the original units of measurement (age, IQ, 
etc.) vanishing. From this one can secure P values from a table of normal 
curve functions, P being the probability of k being exceeded through errors 
of random sampling. 
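
The passage from k to P can be sketched without a printed table. The sketch assumes that P is the two-sided area of the normal curve beyond ±k, an assumption consistent with the pairing of k = 2.00 with P = .0455 adopted later in this chapter; the function names are chosen for the example.

```python
from math import erfc, sqrt


def relative_deviate(x: float, mean: float, sd: float) -> float:
    """Number of standard deviations that x lies from the mean of its series."""
    return (x - mean) / sd


def p_exceeding(k: float) -> float:
    """Probability of a deviate at least as large as |k| in either direction,
    under the normal curve (two-sided area, an assumed reading of P)."""
    return erfc(abs(k) / sqrt(2))


p_at_two = p_exceeding(2.00)
```

Rounded to four places, `p_exceeding(2.00)` gives .0455, and the k values of 3.26, 2.88, and 0.45 printed in Table VIII give .0011, .0040, and .6527, matching the P values printed beside them.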

Expanding the preceding equation to fit the data of this experiment, 
there was used, as typical, this form: 

$$ k = \frac{(\bar{X}_B - \bar{X}_C) - b_Y(\bar{Y}_B - \bar{Y}_C) - b_Z(\bar{Z}_B - \bar{Z}_C) - b_W(\bar{W}_B - \bar{W}_C)}{s\sqrt{\dfrac{1}{N_B} + \dfrac{1}{N_C}}} $$

where k is the relative deviate for a difference in gain (X) of a Blame group 
versus a Control group, the difference being corrected by multiple regression 
for the variables of chronological age (Y), mental age (Z), and initial 
score (W), respectively, with constants written here as b_Y, b_Z, and b_W, 
and s the estimate of the standard deviation. The constants employed were found 
from product-moment correlations which are shown in Tables III and IV, 
pages 31 and 32. 
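
The correction just described can be sketched in code. The printed formula is not fully legible, so the form used here is an assumption: each covariate difference is multiplied by a regression constant and subtracted from the raw difference in gain, and the result is divided by a standard error of the form s·√(1/N₁ + 1/N₂). The function name, the constants, and every number in the example are hypothetical.

```python
from math import sqrt
from typing import Sequence, Tuple


def corrected_relative_deviate(
    gain_diff: float,
    covariate_diffs: Sequence[float],
    constants: Sequence[float],
    sd: float,
    n_first: int,
    n_second: int,
) -> Tuple[float, float]:
    """Return (corrected difference, k) under the ASSUMED form of the formula."""
    # Amount of the raw difference attributed to CA, MA, and initial score.
    correction = sum(b * d for b, d in zip(constants, covariate_diffs))
    corrected = gain_diff - correction
    standard_error = sd * sqrt(1 / n_first + 1 / n_second)
    return corrected, corrected / standard_error


# Hypothetical case: raw difference in gain of 10 points between two groups
# of 25, with group differences in CA, MA, and initial score of 2, -4, and 5.
corrected, k = corrected_relative_deviate(
    gain_diff=10.0,
    covariate_diffs=(2.0, -4.0, 5.0),
    constants=(-0.5, 0.25, -0.2),
    sd=20.0,
    n_first=25,
    n_second=25,
)
```

In this hypothetical case every term of the correction is negative, so the corrected difference (13.0) exceeds the raw difference (10.0), which illustrates how a corrected difference can come out larger than the raw one.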

In testing the significance of a difference between two means, the level 
of significance chosen was k=2.00, P=.0455. That is, if P were larger 
than .0455, the hypothesis that the difference was a real difference and not 
due to errors of random sampling was rejected. 


Examination of the Learning Curves 


Word picturizations of curves are very apt to be unsatisfactory and poor 
substitutes for the curves themselves. Consequently, they will be dealt with 
rather summarily, the attempt being made only to designate some of the 
most salient features. Tables V and VI present the scores made by the 
various groups on the several tests. Figures 3 through 13 are the graphs 
for these scores. From examination of the curves several things will be 
noted: (1) There appears no marked age or grade differentiation in the 
general slope of these curves. With advancing status, the initial point is 
higher, but so is the point reached on S_F. The curves are flatter for Tester B 
than for Tester A. (2) In no instance does a curve fall below a point 
previously reached. Even where a loss in acceleration in the curve appears, 
the lesser improved score is still above that of the preceding test. This may 
be attributed, in part at least, to practice effect. That is, even though 
the specific incentive, or some other extrinsic or intrinsic factor, may have 
TABLE V

Scores (x̄ and s) on the Five Tests. Tester A

                        S_I           S_1           S_2           S_3            S_F
GRADE          N      x̄     s      x̄     s      x̄     s      x̄     s       x̄     s
VII
  Praise      26    18.9  14.9    49.9  22.3    69.9  23.1    85.2  23.4     97.3  24.9
  Blame       29    20.4  11.8    42.3  16.7    62.5  17.5    73.9  28.6     81.3  24.3
  Control     30    21.6   8.9    42.7  15.0    58.1  21.1    72.2  24.6     79.9  25.7
VIII
  Praise      31    25.6  11.2    54.7  17.4    73.8  18.5    89.3  21.0     98.3  22.6
  Blame       30    26.6   4.5    54.2  16.2    70.2  22.5    89.6  26.6     97.7  29.:
  Control     33    33.3  14.9    60.9  22.3    80.4  26.4    93.3  24.6     97.2  27.1
VII-VIII
 Boys:
  Praise      29    21.2  14.8    47.8  19.6    68.6  21.3    87.2  22.9     98.5  21.9
  Blame       29    19.3  12.1    41.1  16.3    60.3  20.8    72.6  23.4     80.2  25.8
  Control     28    27.6  12.2    47.4  19.9    65.1  26.1    77.7  27.9     83.3  25.3
 Girls:
  Praise      28    23.5  11.9    55.5  16.0    76.0  18.9    87.7  21.4     97.1  25.0
  Blame       30    27.6  12.4    55.0  16.4    72.3  18.6    90.8  24.2     99.2  26.9
  Control     35    27.6  15.1    55.8  22.2    74.2  25.5    87.4  25.6     93.4  29.0
H.S.
  Praise      14    32.9  11.2    60.6  16.4    78.5  21.5    91.1  23.4    104.0  27.5
  Blame       41    32.9  11.2    59.9  19.1    82.9  19.9    97.8  21.4    107.4  22.9
  Control     23    33.0  19.1    57.8  18.8    74.2  23.7    87.9  22.8     98.9  27.3
COLLEGE
  Praise      10    35.4  10.2    61.1  14.2    80.4  19.1    96.8  23.2    100.1  17.5
  Blame       10    38.9  14.1    69.3  12.7    79.6  13.3    92.9  17.6    103.3  22.2
  Control     10    37.6  12.2    56.5  12.8    71.8  18.9    83.4  28.7     91.0  25.0
TABLE VI

Scores (x̄ and s) on the Five Tests. Tester B

                        S_I           S_1           S_2           S_3            S_F
GRADE          N      x̄     s      x̄     s      x̄     s      x̄     s       x̄     s
VII
  Praise      29    30.8  13.9    50.6  21.2    66.1  25.2    74.0  24.9     78.1  26.6
  Blame       30    34.3  12.2    54.0  18.6    68.0  23.0    75.9  18.1     83.1  23.0
  Control     22    32.1  11.9    48.3  16.9    60.4  17.1    73.8  12.3     83.1  19.1
VIII
  Praise      31    40.5  10.9    57.9  18.7    73.0  19.3    83.8  17.9     88.6  39.2
  Blame       28    29.3  14.3    51.1  18.7    65.2  20.9    84.0  19.8     88.8  21.2
  Control     33    44.4  14.2    61.5  22.8    76.9  20.5    85.7  21.1     88.4  23.5
VII-VIII
 Boys:
  Praise      31    32.9  10.8    50.6   ..      ..    ..      ..   19.2     81.9  22.9
  Blame       26    29.7  11.5    52.3   ..      ..    ..     74.7  14.8     81.2  17.8
  Control     26    37.9  14.8    54.4   ..      ..    ..     78.8  20.9     86.3  24.2
 Girls:
  Praise      29    38.8  15.2    58.4   ..      ..    ..     83.3  15.7     86.4  24.9
  Blame       32    33.6  14.6    58.3   ..      ..    ..     85.0  15.1     89.8  24.8
  Control     29    40.8  14.3     ..    ..      ..    ..      ..   19.1     86.2  19.9
H.S.
  Praise      41    44.6  16.8    67.0  24.9     ..    ..      ..    ..     106.0  27.4
  Blame       30    49.9  11.9    74.2  19.8    94.0   ..    107.8   ..     120.9  21.9
  Control     43    41.7   6.7    57.9  17.7    76.3   ..     89.3   ..      98.3  23.6

(.. marks an entry not legible in the source.)

been operating to inhibit further progress, they were not of sufficient potency 
to overcome the amount of progress already made. (3) There is a distinct 
tendency for the rate of improvement to fall off between S_3 and S_F. This 
is evident in all groups in Figures 4, 7, and 12; while it is manifest for 
Blame and for Control groups in Figure 3; for Praise, in Figure 13; and 
for Control, in Figure 9. This may have been brought about through the 
common incentive that was applied having an over-stimulating effect. 
That is, the subjects were so anxious to succeed that the feeling tone aroused 
actually produced inhibitory action. (4) In no other respects are the curves 
found to have consistently common characteristics. For example, in Figure 
3: Control group leads at S_I, holds this over Blame only at S_1, then falls 
below both of the other groups from there through S_F; whereas, the Praise 
group starts with the lowest score, jumps definitely into the lead at S_1, and 
continues to increase this lead to the end. In Figure 13, Blame assumes 
an early lead, loses it to the Praise group, then leads both Praise and 
Control at S_F. In Figure 11, Control is superior to the other groups 
throughout all the tests. 

In other cases, groups, as in Figure 7, which show little effect from the 
specific incentives are almost coincident at S_F; while in still other cases, as 
in Figure 10, groups which show some effect of the incentives show a 
larger effect at the end. 

In yet other instances, as in Figure 4, groups which are in the lead to 
S_3 fall below at S_F; while groups, as in Figure 6, which have trailed the 
others right along assume the lead on the final test. 

In regarding Figures 9, 10, 11, and 12, one is struck by the appearance 
of the curves. Those for the boys, with each tester, follow more or less 
parallel courses, while those for the girls are quite close together and 
interwoven. It would appear from this on first analysis that the boys are more 
affected than the girls by the test situation in general than by the specific 
stimuli in particular. That is, it would seem that the preliminary “set,” 
or the “goal-set,” of the boys was more evident, more diversified, and less 
subject to extrinsic motivation than was the case with the girls. The 
girls’ “set” was more common to all the groups, but was the more readily 
affected by extrinsic factors. However, the closeness of the curves to each 
other for the girls may be giving to their interweaving exaggerated 
impressions of their variability, or affectivity. 


Discussion of the S_I-S_F Gains 


Grades 

Tables VII, VIII and IX present the raw and corrected gains (S_I-S_F 
scores) of the various groups arranged according to grade. Table VIII 
shows the significance of the differences for the gains uncorrected. Table 
IX shows the significances corrected for the influence of MA, CA, and 
Initial Score for grades VII and VIII, high school and college. From 
Table IX, it will be observed that in grade VII with the one tester the 
Praise group shows a significant gain over both the Blame and Control 
groups: P=.0050 and .0009, respectively. With the other tester no gains 
are of statistical significance. 


TABLE VII

Raw Gains (x̄ and s)

                     PRAISE                BLAME                CONTROL
GRADE            N     x̄      s        N     x̄      s        N     x̄      s
VII
  Tester A      26   78.4   22.4      29   60.9    ..       30   58.3   23.9
  Tester B      29   47.3   25.6      30   48.8   23.1      22   51.0   12.5
VIII
  Tester A      31   72.7   20.9      30   71.1   26.2      33   63.9   23.7
  Tester B      31   48.1   18.3      28   59.5   21.7      33   44.0   19.3
H.S.
  Tester A      14   71.1   23.4      41   74.5   21.5      23   65.9   18.7
  Tester B      41   61.4   26.3      30   71.0   20.1      43   56.6   21.2
COLLEGE
  Tester A      10   64.7    ..       10   64.4    ..       10   53.4   25.2

(.. marks an entry not legible in the source.)

In grade VIII, the gains of the Blame group are significantly different 
with Tester B: P=.0000 for the Blame group over the Control, and 
P=.0002 for Blame over Praise. With the other tester, none of the 
differences is of significance. 

At the high school level, only Blame over Control with Tester B bears 
statistical significance: P=.0060. 

With the college group none of the differences is significant. 


Sexes 


In the foregoing portrayals of the curves and scores of the groups divided 
as to grades, both boys and girls were included in grades VII and VIII. 
Because of the smallness of numbers, it was not considered feasible to 
separate the sexes as to grade. Instead, the sexes were separated by grouping 
together both seventh and eighth grades (designated as grade VII-VIII 
combined). Tables I and II (pp. 23 and 24) show the similarities of the 
two years when separated and when combined with respect to MA, CA, 
and Initial Score. 

Tables X, XI, and XII show the raw gains and significance of the 
differences in scores for grade VII-VIII combined with the subjects 
separated as to sex. In Table XI, it will be noticed that only in the case 
of boys with Tester A is Praise significant over both Control and Blame: 
P=.0000 and .0042, respectively; while only Blame over Control shows a 
significant difference with the girls for Tester B: P=.0006. 


TABLE VIII

Significance of Difference in Gains, Uncorrected for Influence of CA, MA,
and IS. Grades VII, VIII, H.S.

                      Tester A                  Tester B
               Diff.     k       P        Diff.     k       P
VII
  P vs. C      20.1    3.26   .0011       -3.7    0.68   .4965
  B vs. C       2.6    0.45   .6527        2.2    0.44   .6599
  P vs. B      17.5    2.88   .0040       -1.5    0.24     ..
VIII
  P vs. C       8.8    1.60   .1096        4.1    0.86   .3898
  B vs. C       7.2    1.13   .2585       15.5    2.92   .0035
  P vs. B       1.6    0.27   .7872      -11.4    2.17   .0300
H.S.
  P vs. C       5.2    0.75   .4533        4.8    0.92   .3576
  B vs. C       8.6    1.68   .0930       14.4    2.96   .0031
  P vs. B        ..    0.52   .6031         ..    1.75   .0801
COLLEGE
  P vs. C      11.3     ..    .2585
  B vs. C      11.0     ..    .2801
  P vs. B        ..     ..    .9761

(.. marks an entry not legible in the source.)

In Table XII, the differences in corrected gains are matched as to the 


sexes for the combined grade VII-VIII. In only one instance, that of
Control girls over Control boys, is there apparent a significant difference
between the sexes: P=.0285.

The general lack of statistically reliable differences in gains between the
sexes may be a partial answer at least to the observation made on page 36
of the apparent wide dissimilarity in the learning curves for boys and girls.


TABLE IX

SIGNIFICANCE OF DIFFERENCE IN GAINS, CORRECTED FOR INFLUENCE OF CA, MA, AND IS.
GRADES VII, VIII, H.S. AND COLLEGE

                    Tester A                   Tester B
             Diff.     k       P        Diff.     k       P
VII
  P vs. C    19.9    3.31    .0009     -2.20    0.42    .6745
  B vs. C     2.5    0.26    .7949     -1.40    0.29    .7718
  P vs. B    16.9    2.81    .0050     -0.60    0.12      --
VIII
  P vs. C    8.74    1.48    .1389      5.80    1.23    .2187
  B vs. C    7.50    1.25      --      23.10    4.84    .0000
  P vs. B    1.34    0.22    .8259    -17.00    3.73    .0002
H.S.
  P vs. C    3.70    0.55    .5823      4.20    0.64    .5222
  B vs. C    7.40    1.48    .1389     13.10    2.75    .0060
  P vs. B   -3.44    0.60    .5484     -7.20    1.28    .2005
COLLEGE
  P vs. C   15.20     --     .0854
  B vs. C   11.60     --     .1868
  P vs. B    3.60     --     .6892

[-- marks an entry not legible in this copy.]


The Different Testers


If the various initial scores be recalled, or reference made to Tables I
and II and the various graphs, it will be observed that in every instance
the subjects made a larger initial score (averaging approximately ten points
at each level) with Tester B than with Tester A. Tables XIII through
XVIII show the various differences in raw and corrected scores and gains
between the subjects of Tester A and Tester B. Table XVIII is a composite
table of gains (raw scores).


TABLE X

RAW GAINS. BOYS AND GIRLS. TESTER A AND TESTER B DIFFERENTIATED.
GRADES VII-VIII COMBINED

               PRAISE            BLAME            CONTROL
TESTER       B       G        B       G        B       G
A          77.3    73.6     60.9    71.6     55.7    65.8
  N        (29)    (28)     (29)    (30)     (28)    (35)
B          49.0    47.6     51.5     --       --     45.4
  N        (32)    (29)     (26)     --       --     (29)

[-- marks an entry not legible in this copy. The Tester B means are placed
in the columns that agree with the differences given in Table XVI.]


TABLE XI

SIGNIFICANCE OF DIFFERENCE IN GAINS, CORRECTED FOR INFLUENCE OF CA, MA,
AND IS. BOYS AND GIRLS. GRADES VII-VIII COMBINED

                      Tester A                        Tester B
              N      Diff.     k       P        N      Diff.     k       P
Boys
  P vs. C   29, 28   26.8    1.35    .0000    32, 26    3.30    0.66    .5093
  B vs. C   29, 28    9.0    1.47    .1416    26, 26    6.80    1.34    .1802
  P vs. B   29, 29    9.9    2.86    .0042    31, 26   -3.50    0.68    .4965
Girls
  P vs. C   28, 35     --     --     .6818    29, 29    7.00     --     .2137
  B vs. C   30, 35   3.41     --     .3524    32, 29   19.40     --     .0006
  P vs. B   28, 30  -0.65     --     .8650    29, 32   -6.20     --     .2713

[-- marks an entry not legible in this copy. The k of 1.35 for boys, Tester A,
P vs. C, is given as it reads [sic].]


TABLE XII

SIGNIFICANCE OF DIFFERENCE OF GAINS, CORRECTED FOR INFLUENCE OF CA, MA,
AND IS. BOYS VS. GIRLS. GRADES VII-VIII COMBINED

                      Tester A                        Tester B
              N      Diff.     k       P        N      Diff.     k       P
  P vs. P   29, 28    6.9    1.16    .2460    31, 29     3.4    0.67    .5029
  B vs. B   29, 30   -9.0    1.50    .1336    26, 32    -3.4    0.66    .5093
  C vs. C   28, 35  -12.5    2.19    .0285    26, 29     4.1    0.76    .4473


Examination of Table XIII will show that in every instance but two the
initial scores of Tester B are significantly higher than those of Tester A.
The two cases that are exceptions are in grade VIII, Blame, and grade
VII-VIII, girls, Blame, where P=.3371 and .1010, respectively.

It can be seen from Table XV that the subjects of Tester A gained
substantially and significantly over those of Tester B. In only two instances
could the difference be attributable to errors of random sampling. In
these two cases, P was equal to .3271 for high school, Blame, and equal
to .0488 for high school, Control. In this latter case the null hypothesis
approaches very closely to being rejected.

The effect of the correction may be observed where Tester A and 
Tester B are compared. Looking at Table XIV, uncorrected gains, it is 
observed that in four instances out of the nine significant differences are 
apparent. In Table XV, where the gains have been corrected for the 
effect of MA, CA, and Initial Score, seven of the nine differences are now 
of significance. In all instances the differences were in favor of Tester A. 
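
The counts of four and seven significant differences can be retraced from the entries of Tables XIV and XV. The sketch below is illustrative only: it assumes that a difference was counted as significant when k was at least 2, a criterion that is not stated in the text but is suggested by the treatment of P=.0488 (k=1.97) as falling just short of significance. Where a k is not legible the printed P is compared with .0455, the two-tailed normal probability at k=2. All names in the code are hypothetical.

```python
K_CRITERION = 2.0        # assumed cut-off; not stated in the text
P_AT_CRITERION = 0.0455  # two-tailed normal P at k = 2, used only when k is illegible

# (group, k, P) as printed; None marks an entry that is not legible.
TABLE_XIV_UNCORRECTED = [
    ("VII Praise", 4.82, 0.0000), ("VII Blame", 2.05, 0.0404), ("VII Control", 1.43, 0.1527),
    ("VIII Praise", 4.95, 0.0000), ("VIII Blame", 1.84, 0.0658), ("VIII Control", 3.73, 0.0002),
    ("H.S. Praise", None, 0.1676), ("H.S. Blame", None, 0.4839), ("H.S. Control", None, 0.0658),
]
TABLE_XV_CORRECTED = [
    ("VII Praise", 6.07, 0.0000), ("VII Blame", 3.13, 0.0017), ("VII Control", 2.23, 0.0257),
    ("VIII Praise", 5.15, None), ("VIII Blame", 2.14, 0.0324), ("VIII Control", 4.31, 0.0000),
    ("H.S. Praise", 2.26, 0.0238), ("H.S. Blame", 0.96, 0.3271), ("H.S. Control", 1.97, 0.0488),
]


def is_significant(k, p):
    """Judge by k when it is legible, otherwise by the printed P."""
    if k is not None:
        return k >= K_CRITERION
    return p <= P_AT_CRITERION


def count_significant(rows):
    """Number of rows that meet the assumed criterion."""
    return sum(is_significant(k, p) for _, k, p in rows)


print(count_significant(TABLE_XIV_UNCORRECTED), count_significant(TABLE_XV_CORRECTED))
```

With the assumed criterion the two counts come out as four and seven, matching the statement in the text; a criterion of P below .05 would instead give eight for Table XV.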

Differentiating the sexes, grades VII-VIII combined, it will be observed 
from Table XVII that again significant differences exist between testers, 
with the increase on the side of Tester A. Here, in only one case, Control 
boys, can the difference be ascribed to chance: P=.0819. Yet this is quite
close to being acceptable and is certainly in the trend with the other groups. 

The question arises as to the possible causes of the difference in gains in
favor of Tester A over Tester B. Is this difference attributable to the lower
scores on S1 (the initial test) for Tester A, thus making the difference in
gains a function of S1, or are the scores on the final test higher for
Tester A than for Tester B?

In Table XVIII, which is a composite of several of the preceding tables,
it can be seen that in nine cases (groups) out of eleven the significant
difference in gains is accompanied by a significant difference in the S1
score; but in only two cases out of the eleven is this difference accompanied
by a significant difference in the final-test score. Schematized, this appears thus:


SIGNIFICANT DIFFERENCE IN GAINS ACCOMPANIED BY:

                          Sig. Diff. in        Non-Sig. Diff. in
                          final score          final score
Sig. Diff. in S1              --                   --
Non-Sig. Diff. in S1          --                   --
Totals                        --                   --

[The cell entries of this schema are not legible in this copy.]


TABLE XIV

SIGNIFICANCE OF DIFFERENCE IN GAINS, UNCORRECTED FOR INFLUENCE OF CA, MA,
AND IS. TESTER A VS. TESTER B. GRADES VII, VIII, AND H.S.

TESTER A VS. TESTER B
                                          N
Group        Diff.     k       P       A     B
VII
  P vs. P    31.1    4.82    .0000    26    29
  B vs. B    12.1    2.05    .0404    29    30
  C vs. C     7.3    1.43    .1527    30    22
VIII
  P vs. P      --    4.95    .0000    31    31
  B vs. B    11.6    1.84    .0658    30    28
  C vs. C    19.9    3.73    .0002    33    33
H.S.
  P vs. P     9.7     --     .1676    14    41
  B vs. B     3.5     --     .4839    41    30
  C vs. C     9.3     --     .0658    23    43

[-- marks an entry not legible in this copy.]

In Table XVIII the significant differences all have been italicized. The
asterisk indicates that when the differences have been corrected for the
influence of MA, CA, and Initial Score, they are significant (cf. Tables XV
and XVII).


TABLE XV

SIGNIFICANCE OF DIFFERENCE IN GAINS, CORRECTED FOR INFLUENCE OF CA, MA, AND IS.
TESTER A VS. TESTER B. GRADES VII, VIII, AND H.S.

TESTER A VS. TESTER B
                                          N
Group        Diff.     k       P       A     B
VII
  P vs. P    35.2    6.07    .0000    26    29
  B vs. B    17.5    3.13    .0017    29    30
  C vs. C    13.5    2.23    .0257    30    22
VIII
  P vs. P    28.6    5.15      --     31    31
  B vs. B    11.9    2.14    .0324    30    28
  C vs. C    23.0    4.31    .0000    33    33
H.S.
  P vs. P    14.3    2.26    .0238    14    41
  B vs. B     4.9    0.96    .3271    41    30
  C vs. C    10.3    1.97    .0488    23    43

[-- marks an entry not legible in this copy.]


TABLE XVI

SIGNIFICANCE OF DIFFERENCE IN GAINS, UNCORRECTED FOR INFLUENCE OF CA, MA,
AND IS. TESTER A VS. TESTER B. BOYS AND GIRLS. GRADES VII-VIII COMBINED

                 N
Group         A     B     Diff.     k       P
Boys
  P vs. P    29    31     28.3    5.55    .0000
  B vs. B    29    26      9.4    1.62    .1052
  C vs. C    28    26       --    1.38    .1676
Girls
  P vs. P    28    29     26.0    4.33    .0000
  B vs. B    30    32     15.4    2.37    .0178
  C vs. C    35    29     20.4    3.92      --

[-- marks an entry not legible in this copy.]


TABLE XVII

SIGNIFICANCE OF DIFFERENCE IN GAINS, CORRECTED FOR INFLUENCE OF CA, MA, AND IS.
TESTER A VS. TESTER B. BOYS AND GIRLS. GRADES VII-VIII COMBINED

                 N
Group         A     B     Diff.     k       P
Boys
  P vs. P    29    31     32.5    6.33      --
  B vs. B    29    26     11.3    2.12    .0340
  C vs. C    28    26      9.3    1.74    .0819
Girls
  P vs. P    28    29     31.6    5.44    .0000
  B vs. B    30    32     18.0    3.37    .0008
  C vs. C    35    29     33.6    8.29    .0000

[-- marks an entry not legible in this copy.]


Recessional


In only 31 individual cases did subjects fail to make a better score on
succeeding tests up to the final test. A few of these, evidently from
examination of their papers, were ones who had misunderstood directions and
had begun incorrectly, but later discovered their mistake and had begun over
at a later test. However, at the final test there was noted in many instances
in individual cases an actual decrease in score from that of the test
immediately preceding. In all groups, on the tests before the final one, only
31 subjects, or 5.4 per cent, made a poorer score on succeeding tests than on
the one immediately prior. On the final test for all groups, however, a total
of 98 subjects, or 17+ per cent, did more poorly than on the test immediately
preceding.

Subjectively, it was observed that the announcement of the reward of
one dollar brought not a few “Ah’s” and “Oh’s”, and other comments,
which were followed in many instances by quite feverish activity.

In order to determine whether or not the percentage of those receding
on the last test was significantly larger than the percentage receding on all
tests up to the last, the probable error of the difference between the two
percentages was determined. The type formula
$PE_{diff} = \sqrt{(PE_1)^2 + (PE_2)^2}$ was employed, the PE in each case
being a derivation of the proportion probable error,
$PE_p = .6745\sqrt{pq/N}$.
The difference in percentage was 11.67, and the probable error of this
difference was 1.23, showing that there is a significantly larger proportion
receding on the last test than on all the other tests combined up to the
last test.
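
The reported figures can be retraced with a short calculation. The sketch below is a hedged illustration, not the author's worksheet: it assumes the standard probable error of a proportion, .6745 times the square root of pq/N, multiplied by 100 to put it on the percentage scale; it assumes a total of 574 subjects, a number inferred from 31 subjects being 5.4 per cent and not stated in this section; and it treats the two percentages as independent. The function and variable names are illustrative.

```python
import math


def probable_error_of_percentage(count, n):
    """Probable error of a percentage: 100 * 0.6745 * sqrt(p * q / n)."""
    p = count / n
    return 100 * 0.6745 * math.sqrt(p * (1 - p) / n)


def probable_error_of_difference(pe_1, pe_2):
    """Probable error of a difference between two independent percentages."""
    return math.sqrt(pe_1**2 + pe_2**2)


N_SUBJECTS = 574            # assumption: inferred from 31 subjects = 5.4 per cent
RECEDING_BEFORE_LAST = 31   # receded on some test before the last
RECEDING_ON_LAST = 98       # receded on the last (reward) test

pct_before = 100 * RECEDING_BEFORE_LAST / N_SUBJECTS
pct_last = 100 * RECEDING_ON_LAST / N_SUBJECTS
difference = pct_last - pct_before
pe_diff = probable_error_of_difference(
    probable_error_of_percentage(RECEDING_BEFORE_LAST, N_SUBJECTS),
    probable_error_of_percentage(RECEDING_ON_LAST, N_SUBJECTS),
)
print(f"difference = {difference:.2f}, PE = {pe_diff:.2f}, ratio = {difference / pe_diff:.1f}")
```

Under these assumptions the difference comes to about 11.67 and its probable error to roughly 1.24, close to the 11.67 and 1.23 reported; the difference is therefore some nine times its probable error.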


The Correction 


Returning to our earlier discussion on the question of equating for 
various variables, some interesting phenomena might be noted. What may 
have been lost had actual physical equating of the groups been effected, 
is, of course, almost wholly conjectural. On the other hand, the fact cannot 
be evaded that any estimate of a parameter, whatever its refinements may 
be, still remains only an estimate. In this experiment, all the cases have 
been retained and a statistical procedure has been employed to correct for 
the influences of MA, CA, and Initial Score. It should follow that inquiry 
be made as to the effects of the corrections or, perhaps more pertinently, 
the effects of the variables of MA, CA, and Initial Score. 

Tables VIII, IX (pp. 38-39), XIV, XV, XVI, and XVII (pp. 43-45) show 
the gains uncorrected and the gains corrected for the influence of MA, CA, 
and Initial Score. It will be noted that in some instances the disparity 
between the two gains seems of appreciable magnitude. However, further 
analysis shows that P values for uncorrected gains have been altered significantly
with correcting in only a few instances. These several cases are to
be found in cases of Tester A vs. Tester B. From Table XIV it will be 
seen that for grade VII in the case of the Controls, P was equal to .1527, 
while in Table XV, where the gains had been corrected, P becomes .0257. 
From the same tables, there is similarly, for grade VIII, a Blame change 
for P from .0658 to .0324; and for high school, Praise vs. Praise, a change 
from .1676 to .0238. 

Where the sexes and the different testers are compared, Tables XVI and
XVII, for boys, Blame, P changes from .1052 to .0340.

What seems apparent from analysis of the data in this study is that there
was little or negligible effect of the variables MA, CA, and Initial Score;
and that any equating process could have been dispensed with without in
any way vitiating the results. Indeed, this is foreshadowed by the correlations
obtained at the outset (Tables III and IV, pp. 31 and 32).


Physiological 


Figure x3 shows graphically the x scores for pulse rate of the three 
college groups on the separate tests. There were no significant differences 
from test to test for any of the groups; but the curves, however, show 
several traits in common. 

There is a rise from the preliminary period, S,, to S,, a slight decrease 
in rate at S,, then a gradual rise with a more pronounced incidence 
from S, to S,, and a final falling off at Sp, which approaches the point 
reached at S,. 

For the breathing ratio, Figure 16 portrays graphically the x scores of 
the groups on the separate tests. This was the “I-factor” as described by 
Woodworth (83). As was the case with the pulse rate, here again there 
appear no significant differences between groups, but the trend of the curves 
is interesting. Contrary to the pulse rate, there is a tendency for the 
inspiration-expiration ratio to lessen gradually to Sg; but similarly, as was 
the case with the pulse rate, after the announcement of the reward at S, 
(that is, just before the beginning of the final test), there is a distinct rise 
to S,., followed by subsidence to Sp. 

The data on blood pressure did not lend themselves to definite or quantitative
handling and shall not be presented here. However, certain trends
did seem to be manifest, as [tracing], where there was a drop at “Stop,”
and an increase again at “Begin.” This was a fairly general condition to
prevail, but not always regular. Often the indication of a change was
slight or similar to the following: [tracing].


1 For greater ease in calculating, this procedure was modified a trifle here by multiplying 
the factor by 100. 

2 The method employed of selecting an arbitrary pressure, as was brought out earlier, gives 
neither a systolic nor a diastolic pressure, but will indicate a change in the circulatory system. 
This was originally brought out by J. A. Larson, Lying and Its Detection, 1932; Christian A. 
Ruckmick, The Psychology of Feeling and Emotion (New York: McGraw Hill Co., 1936), 
pp. 291 and 309, speaks of it as an applicable technique. Others, as Harold V. Gaskill, “The
Objective Measurement of Emotional Reactions,” Genet. Psychol. Monogr., Vol. XIV, No. 3,
1935, Pp. 204, have employed this principle using pressure reducers.


[Figure: I-factor (x̄ of group) plotted for the separate tests]

FIG. 16. AVERAGE I/E RATIO (I-FACTOR). MEN. TESTER A


There was attempted also the measuring of the breathing curves on a
linear basis, but the similarity of the samplings did not seem to warrant a
continuation of this procedure.


CHAPTER IV


SUMMARY AND CONCLUSIONS 


Summary 


The purpose of this investigation is threefold: to seek an answer to the 
question of the effects of Praise and Blame as incentives in learning; to 
observe the effects of different testers; and to determine the attendant 
emotional aspects, as evidenced from physiological changes recorded on a 
kymograph. 

From the data gathered in the present study, it is practically certain that: 


1. Praise with Tester A is more effective with grade VII than is either 
Blame or no incentive. 

2. With Tester B, Blame is more effective in grade VIII than either Praise 
or no incentive. 

3. With Tester B, Blame at the high school level is more effective than 
either of the other incentives. 

4. With grade VII-VIII combined, boys and girls differentiated, Praise is 
more effective than Blame or no incentive for boys with Tester A. 

5. With grade VII-VIII combined, boys and girls differentiated, Blame is 
more effective for the girls with Tester B. 

6. A difference exists between the initial scores of the subjects of Tester A 
and Tester B, the larger scores being made by the subjects of Tester B. 

7. A difference in gains exists between the subjects of Tester A and those 
of Tester B, the greater gains being in favor of Tester A. 


It can also be concluded that: 


1. With sexes compared, incentive group vs. incentive group, the differences 
between boys and girls are not of statistical significance. 

2. There is a continued increase in score of the various groups from test
to test up to the final test.

3. There is a statistically significant suggestion that a loss on the final test
from that of the test immediately preceding is not due to chance factors.

4. There appears a tendency for Praise to be more effective with Tester A,
and Blame for Tester B.

5. There appear no physiological patterns to distinguish one form of
incentive from the other.

6. The curves for pulse rate for Praise, Blame, and Control groups follow
the same pattern. There is the tendency for the pulse rate to increase 
slightly from test to test with a marked rise from S, to S,, followed by 
a drop to Sp.

7. The pattern of the curves for the I/E ratio (I-factor) is closely the same
for the three groups. There is a tendency for the ratio to decrease from 
test to test, with a distinct rise from S, to S,, followed by a drop to Sy. 


Conclusions 


This investigation was not concerned with testing the effects of Praise or 
Blame as psychological opposites, but rather their effectiveness, per se, in 
producing a change in the learning situation. The statements of Praise 
or Reproof were administered rather much as would be done in any 
routine classroom situation of like nature; but with the precaution that no 
external factors of either tester would bias the results. The same clothes 
were worn on each testing occasion and the directions and comments were 
rehearsed so as to be delivered with identical intonation and inflection and 
without extraneous clues (vide p. 27). 

From the data as they are found, the conclusion is arrived at that neither 
Praise nor Blame can be singled out as being more effective, the one over 
the other. On the one hand Tester A found, in grade VII-VIII combined, 
Praise to be more effective for boys, but found no differences for girls; on 
the other hand, in grade VII-VIII combined, girls (but not boys), and in 
high school, boys, Tester B found Blame to be the more effective. Tester A 
found no differences at the high school or college levels; and in no case did 
Tester A find Blame more effective. Similarly, in no case did Tester B 
find Praise more effective. 

It is more than just apparent that the test situation, and specifically the 
tester, plays a deciding rôle in the results obtained. In every group but
two the initial scores made by Tester B were significantly higher than those 
made by Tester A. In the larger majority of instances the gains made by 
Tester A were significantly larger than those for Tester B. 

It must be concluded also that the nature of the stimulus is most important;
for if it be non-effective on the one hand, it may be so intense as to
inhibit improvement on the other. Which is to say, if there appears no
evidence to support either Praise or Blame as more effective incentives to
learning, there does seem to be evidence to point to the effect of overstimulation
lessening improvement. Up to the final test, all tests included, only
five per cent of the subjects made a poorer score on a succeeding test, but
on the final (reward) test alone, 17 per cent lost ground.


Implications for Education


It would seem from these data that, in standardizing test procedure,
who is doing the testing should be of some concern. The data of this study have
shown that without exception the subjects made a higher initial score with 
one tester than with the other. The differences were statistically significant 
in all but two groups. These differences developed in the face of strictly 
standardized test procedure, rigid adherence thereto, and control of the 
objective features of each tester. 

If standardized test scores are to be meaningful, it would appear that 
the following of standard test procedure alone is not sufficient to insure 
comparable results. What is indicated is a calibration of test scores in terms 
of the tester, or a calibration of testers. 

Differences as large as were found here might have serious consequences 
to individual pupils and to a school system. If a system employs an 
X, Y, and Z division of pupils, for instance, it is quite conceivable then, 
that pupils will be misplaced; too high, perhaps, in some instances, and 
too low in others. The problem of the proper placement of pupils and 
the harmful effects of misplacement have received too much attention 
elsewhere to warrant here more than an indication of a possible cause of 
maladjustment in school. 

Equally vital appears the question of the “best” incentive to be employed
in order to bring out the most in a pupil, in order to get him to perform
maximally. From the data of this study, one can find no evidence favoring
either Praise or Blame as fulfilling this rôle. The evidence here submitted,
certainly, does not substantiate the popular belief of the necessity and
efficacy of Praise as a prime motivator. To employ Praise in the belief or
in the confidence that a pupil will be made thus to react, or to produce,
maximally, is to proceed upon false assumptions, is to build on a foundation
of clay.